class: center, middle, inverse, title-slide .title[ # Lecture 10: Spatial Data in R ] .subtitle[ ## Getting Started with sf ] .author[ ### James Sears
AFRE 891 SS 24
Michigan State University ] .date[ ### .small[
*Parts of these slides are adapted from
“Advanced Data Analytics”
by Nick Hagerty and
“R Geospatial Fundamentals”
by the UC Berkeley D-Lab used under
CC BY-NC-SA 4.0
.] ] --- <style type="text/css"> # CSS for including pauses in printed PDF output (see bottom of lecture) @media print { .has-continuation { display: block !important; } } .remark-code-line { font-size: 95%; } .small { font-size: 75%; } .scroll-output-full { height: 90%; overflow-y: scroll; } .scroll-output-75 { height: 75%; overflow-y: scroll; } </style> # Table of Contents: Today **Part 1: Getting Started** 1. [Intro to Spatial Data in R and the `sf` package](#intro) 1. [Quick Mapping](#quickmap) 1. [Reference Systems and Projections](#crs) --- # Table of Contents: Later **Part 2: Vector Data Manipulation** 1. Spatial Queries 1. Spatial Subsetting 1. Geometric Operations 1. Spatial Joins **Part 3: Raster Data** 1. Common Raster Data Sources 1. Raster Operations 1. Combining Raster and Vector Data --- class: inverse, middle name: intro # Spatial Data in R and the sf Package --- # GIS The most widespread .hi-medgrn[geographic information system] is .hi-blue[ArcGIS.] -- .hi-blue[Advantages of ArcGIS:] * Avoid coding. * Interface for browsing and exploring data is incredibly comprehensive and fast. -- .hi-green[Why we're using R instead:] - Free. - Reproducible. - Scriptable. - Easily integrated with the rest of your project. - Easy to export attractive, professional maps. - Honestly, easier if you know some R already. --- # sf: Simple Features `sf` is the main package for working with .hi-medgrn[vector] data in R. .center[ <img src="data:image/png;base64,#images/sf.jpg" width = "50%"/>] -- * .hi-blue[features:] things with a spatial location/extent * .hi-pink[simple:] linestrings and polygons are built from points (nodes) connected by straight lines (edges) * Integrates cleanly with .hi-slate[tidyverse] * Since v1.0: uses spherical geometry for transformations! .footer[<font size = "1"> Illustration by [Allison Horst](https://twitter.com/allison_horst/status/1071456081308614656) </font>] --- # sf: Simple Features `sf` is the main package for working with .hi-medgrn[vector] data in R. Install and load it (and a couple other packages): ```r if (!require("pacman")) install.packages("pacman") pacman::p_load(sf, tidyverse, tmap) ``` --- # Shapefiles The ESRI .hi-medgrn[shapefile] is the most widely used type of file .hi-medgrn[format for storing geospatial vector data]. A "shapefile" is actually a collection of 3+ files: -- Required files: - `shp`: The main file that stores the feature geometry - `shx`: A positional index for locating the feature geometry in the `shp` file - `dbf`: The data table (in dBase IV format) that stores the attribute information for each feature --- # Shapefiles The ESRI .hi-medgrn[shapefile] is the most widely used type of file .hi-medgrn[format for storing geospatial vector data]. A "shapefile" is actually a collection of 3+ files: Optional files: - `prj`: Stores the coordinate reference system information. (**should be required!**) - `sbn` and `sbx`: spatial index to speed up geometry operations (*used only by ESRI software*) - `xml`: Metadata — Stores information about the shapefile. - `cpg`: Specifies the code page for identifying the character encoding set to be used. All files need to be kept together in the same directory. --- # Loading Shapefile Data List the files: ```r dir("data/MichiganCounties") ``` ``` ## [1] "MichiganCounties.cpg" "MichiganCounties.dbf" "MichiganCounties.prj" ## [4] "MichiganCounties.shp" "MichiganCounties.shx" "MichiganCounties.xml" ``` Load the data with `st_read()`: ```r counties <- st_read("data/MichiganCounties/MichiganCounties.shp") ``` ``` ## Reading layer `MichiganCounties' from data source ## `F:\OneDrive - Michigan State University\Teaching\MSU 2023-2024\AFRE 891 SS24\Lecture-Slides\10-Spatial\data\MichiganCounties\MichiganCounties.shp' ## using driver `ESRI Shapefile' ## Simple feature collection with 83 features and 15 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -90.41829 ymin: 41.69613 xmax: -82.41348 ymax: 48.26269 ## Geodetic CRS: WGS 84 ``` --- # Loading Shapefile Data We can also load the data by pointing to the *folder* containing the shapefile files: ```r counties <- st_read("data/MichiganCounties") ``` ``` ## Reading layer `MichiganCounties' from data source ## `F:\OneDrive - Michigan State University\Teaching\MSU 2023-2024\AFRE 891 SS24\Lecture-Slides\10-Spatial\data\MichiganCounties' ## using driver `ESRI Shapefile' ## Simple feature collection with 83 features and 15 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -90.41829 ymin: 41.69613 xmax: -82.41348 ymax: 48.26269 ## Geodetic CRS: WGS 84 ``` --- # Geodatabases Shapefiles have some .hi-pink[severe limitations]. * They must be less than 2 GB. * Column names cannot be longer than 10 characters. * The number of columns is limited to 255. -- Another, newer file format is called a .hi-purple[geodatabase (.gdb)]. `st_read()` can handle geodatabases with the `layer` argument. * The important thing to keep in mind is that in your computer, the `.gdb` file *appears* to be a folder, but the individual files within it are uninterpretable. * `st_layers()` will show you the list of layers in a geodatabase. --- # sf Object Contents Looking at the structure of the data, we can see that it's an sf object, but *also* a data frame with an extra .hi-medgrn[geometry] column. This is also easy to check in RStudio in the Environment window. .font70[ ```r str(counties) ``` ``` ## Classes 'sf' and 'data.frame': 83 obs. of 16 variables: ## $ OBJECTID : int 1 2 3 4 5 6 7 8 9 10 ... ## $ FIPSCODE : chr "001" "003" "005" "007" ... ## $ FIPSNUM : int 1 3 5 7 9 11 13 15 17 19 ... ## $ NAME : chr "Alcona" "Alger" "Allegan" "Alpena" ... ## $ LABEL : chr "Alcona County" "Alger County" "Allegan County" "Alpena County" ... ## $ TYPE : chr "County" "County" "County" "County" ... ## $ CNTY_CODE : chr "001" "003" "005" "007" ... ## $ SQKM : num 1799 2425 2181 1539 1359 ... ## $ SQMILES : num 694 936 842 594 525 ... ## $ ACRES : num 444428 599194 538923 380383 335744 ... ## $ VER : chr "17A" "17A" "17A" "17A" ... ## $ LAYOUT : chr "landscape" "landscape" "landscape" "landscape" ... ## $ PENINSULA : chr "lower" "upper" "lower" "lower" ... ## $ ShapeSTAre: num 3.56e+09 5.10e+09 4.03e+09 3.08e+09 2.72e+09 ... ## $ ShapeSTLen: num 242638 567352 261622 408568 255627 ... ## $ geometry :sfc_MULTIPOLYGON of length 83; first list element: List of 1 ## ..$ :List of 1 ## .. ..$ : num [1:1414, 1:2] -83.9 -83.9 -83.9 -83.9 -83.9 ... ## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg" ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ... ## ..- attr(*, "names")= chr [1:15] "OBJECTID" "FIPSCODE" "FIPSNUM" "NAME" ... ``` ] --- # sf Geometry Looking at the geometry column a bit more reveals that * Each row contains a .hi-blue[feature] * Each feature contains a `simple feature geometry list column (sfc)` * Which in turn contains a `simple feature geometry (sfg)` ```r class(counties$geometry) ``` ``` ## [1] "sfc_MULTIPOLYGON" "sfc" ``` ```r # and for the geometry of the first feature? class(counties$geometry[[1]]) ``` ``` ## [1] "XY" "MULTIPOLYGON" "sfg" ``` --- # sf Geometry Types There are three main types of geometries that can be associated with `sf` object: points, lines and polygons: .center[ <img src ="https://datacarpentry.org/organization-geospatial/fig/dc-spatial-vector/pnt_line_poly.png" width="450"></img>] --- # sf Geometry Types In an `sf data.frame` these geometries are encoded in a format known as [Well-Known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry). For example: > - POINT (30 10) > - LINESTRING (30 10, 10 30, 40 40) > - POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) where X,Y coordinate values are separated by a space, coordinate pairs by a comma, and geometries by parentheses --- # sf Geometry Types An `sf` object may also include the variants .hi-blue[multipoints], .hi-medgrn[multilines], and .hi-purple[multipolgyons] if some of the features are composed of multiple geometries: > - MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) > - MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20, 30 10)) > - MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) -- For example, if we had data representing US states (one per row), we could use * POLYGON geometry for states like Utah or Colorado * MULTIPOLYGON for a state like Hawaii, which includes many islands. --- # sf Geometry Another thing to note is that an sf geometry is .hi-pink[sticky] (Or like Kramer from Seinfeld: it always shows up) ```r head(counties[1:5, 1:4]) ``` ``` ## Simple feature collection with 5 features and 4 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -87.11664 ymin: 42.41882 xmax: -83.19065 ymax: 46.69078 ## Geodetic CRS: WGS 84 ## OBJECTID FIPSCODE FIPSNUM NAME geometry ## 1 1 001 1 Alcona MULTIPOLYGON (((-83.88712 4... ## 2 2 003 3 Alger MULTIPOLYGON (((-87.11602 4... ## 3 3 005 5 Allegan MULTIPOLYGON (((-85.54343 4... ## 4 4 007 7 Alpena MULTIPOLYGON (((-83.3434 44... ## 5 5 009 9 Antrim MULTIPOLYGON (((-84.84877 4... ``` --- # sf Object to Dataframe We can easily convert an sf object to a regular data frame by .hi-medgrn[removing the geometry] ```r counties_df <- st_drop_geometry(counties) class(counties_df) ``` ``` ## [1] "data.frame" ``` --- # dataframe to sf We can also go from a dataframe to an sf object if we have column(s) containing coordinate information. -- Let's add the latitude and longitude of each county's centroid to `counties_df` and convert it to an sf object with points geometry: ```r centroids <- st_centroid(counties) counties_df <- mutate(counties_df, longitude = st_coordinates(centroids)[,1], latitude = st_coordinates(centroids)[,2]) %>% * st_as_sf(coords = c("longitude", "latitude"), * crs = st_crs(counties)) class(counties_df) ``` ``` ## [1] "sf" "data.frame" ``` --- # Switching Geometry Columns Note that an sf object can have multiple geometry columns, but only one can be .hi-blue[active] at a time. We can view and switch the active geometry without deleting columns using `st_geometry()`: ```r counties_comb <- mutate(counties, points = counties_df$geometry) st_geometry(counties_comb) ``` ``` ## Geometry set for 83 features ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -90.41829 ymin: 41.69613 xmax: -82.41348 ymax: 48.26269 ## Geodetic CRS: WGS 84 ## First 5 geometries: ``` --- # Switching Geometry Columns Note that an sf object can have multiple geometry columns, but only one can be .hi-blue[active] at a time. We can view and switch the active geometry without deleting columns using `st_geometry()`: ```r *st_geometry(counties_comb) <- "points" st_geometry(counties_comb) ``` ``` ## Geometry set for 83 features ## Geometry type: POINT ## Dimension: XY ## Bounding box: xmin: -89.69307 ymin: 41.88767 xmax: -82.68086 ymax: 47.60781 ## Geodetic CRS: WGS 84 ## First 5 geometries: ``` --- class: inverse, middle name: quickmap # Quick Mapping --- # Quick Mapping There are several ways we can quickly make maps in R * base R's `plot()` function * .hislate[tmap] package's `qtm()` function * `ggplot()` and the `geom_sf()` function Later on we'll go more in-depth with how to customize maps --- # Quick Mapping: base R ```r plot(counties) ``` <img src="data:image/png;base64,#Spatial-Intro_files/figure-html/unnamed-chunk-13-1.png" width="75%" style="display: block; margin: auto;" /> This yield a grid of maps where colors correspond to data values of the first 9 data frame columns --- # Quick Mapping: base R To map just a single variable: ```r plot(counties["SQMILES"]) ``` <img src="data:image/png;base64,#Spatial-Intro_files/figure-html/unnamed-chunk-14-1.png" width="80%" style="display: block; margin: auto;" /> This gives a **choropleth** map of county size (in square miles) > A `choropleth` map us a thematic map where data values are used to symbolize polygons --- # Quick Mapping: base R Or just the geometry: ```r plot(counties["geometry"]) ``` <img src="data:image/png;base64,#Spatial-Intro_files/figure-html/unnamed-chunk-15-1.png" width="80%" style="display: block; margin: auto;" /> ```r # alternatively: plot(st_geometry(counties)) # alternatively: plot(counties[["geometry"]]) # alternatively: plot(counties$geometry) ``` --- # Quick Mapping: tmap (static) The .hi-slate[[tmap] package makes is easy to make thematic maps in R, both static and interactive ```r qtm(counties) ``` <img src="data:image/png;base64,#Spatial-Intro_files/figure-html/unnamed-chunk-16-1.png" width="65%" style="display: block; margin: auto;" /> --- # Quick Mapping: tmap (interactive) ```r tmap_mode("view") # the default is mode = "plot" qtm(counties) ```
--- # Quick Mapping: Invalid Polygons errors Sometimes you may get the error `## Error: Shape contains invalid polygons` This can usually be fixed quickly with `st_make_valid()`. ```r counties <- st_make_valid(counties) ``` --- # Data Wrangling Works Normally ```r ingham <- filter(counties, NAME == "Ingham") map_ingham <- tm_shape(counties) + tm_polygons(border.col = "white") + tm_shape(ingham) + tm_borders(col = "green", lwd = 3) map_ingham ```